Downloaded from http://biorxiv.org/on September 18, 2014 




Modelling reactions catalysed by carbohydrate enzymes 

Onder Kartal, Oliver Ebenhöh and Martin Steup
bioRxiv first posted online September 1, 2014

Access the most recent version at doi: http://dx.doi.org/10.1101/008615



The copyright holder for this preprint is the author/funder. It is made available under
a Creative Commons CC-BY-NC 4.0 International license.






Modeling Reactions Catalyzed by
Carbohydrate-Active Enzymes

O. Kartal, O. Ebenhöh, M. Steup



Abstract Carbohydrate polymers are ubiquitous in biological systems and their
roles are highly diverse, ranging from energy storage and mechanical stabilisation
to mediating cell-cell or cell-protein interactions. This functional diversity is
mirrored by a chemical diversity that results from the high flexibility with which
different sugar monomers can be arranged into linear, branched or cyclic polymeric
structures. Mathematical models describing biochemical processes on polymers
face various difficulties. First, polymer-active enzymes are often specific to
some local configuration within the polymer but indifferent to other features.
That is, they are potentially active on a large variety of different chemical compounds,
meaning that polymers of different size and structure simultaneously compete
for enzymes. Second, large polymers in particular interact with each other and
form water-insoluble phases that restrict or exclude enzyme diffusion. This heterogeneity
of the reaction system has to be taken into account by explicitly considering
processes at the often complex surface of the polymer matrix. We review recent
approaches to theoretically describe polymer biochemical systems. Each approach
addresses a particular challenge, which we discuss in more detail. We emphasise a recent
approach that draws novel analogies between polymer biochemistry and statistical
thermodynamics and illustrate how this parallel leads to novel insights about
non-uniform polymer reactant mixtures. Finally, we discuss the future challenges of the
young and growing field of theoretical polymer biochemistry.

1 Introduction



Prokaryotic and eukaryotic cells synthesise a large number of chemically diverse 
polysaccharides (also designated as glycans) that consist of a large number of 
monosaccharide moieties linked by inter-sugar bonds. Chemical diversity includes 
both the sequence of monosaccharide residues and the type of the inter-sugar link- 
ages. As these linkages can be made to any hydroxyl group of the monosaccha- 
ride residues, both linear and branched glycans exist but, in terms of quantity, lin- 
ear structures (i.e. glycan chains) are by far dominant. Furthermore, within a given 
glycan the number of branching types is usually low. Polysaccharides exert many 
distinct biological functions, such as carbon and energy storage ([59], [25], [36]),
mechanical stabilisation of cells or tissues [8], cell-cell or cell-protein interactions
([41], [14]) and organelle division [58]. In addition, glycans have attracted considerable
(bio)technological interest because they are used as starting materials
or additives for many technological applications [46] and act as a renewable energy
source [20].

Polysaccharides constitute the most abundant polymer type present in biotic sys- 
tems. As compared to the vast majority of proteins and nucleic acids, cells use, 
however, an entirely different mode to synthesise carbohydrate polymers. This pe- 
culiarity is due to two reasons: First, no general molecular equipment (functionally 
equivalent to the ribosome in protein biosynthesis) exists that is capable of forming 
any glycan molecule provided structural information is available. Second (and similar
to the rare cases of non-ribosomal peptide biosynthesis [44]), glycans are specified
by the kinetic properties of the glycan-synthesising enzymes but are not encoded
by any non-carbohydrate system comparable to those in protein biosynthesis
(the sequence of base triplets in genes and in their messenger RNAs). Due to the
lack of appropriate enzymes, most of the theoretically possible diversity of glycans
is not realised in living systems.

This mode of biosynthesis has several important implications. A large number 
of carbohydrate-active enzymes are required to synthesise complex glycans and all 
these enzymes need to be encoded in the genome. Carbohydrate-active enzymes 
often catalyse not a single reaction but rather perform a series of closely related 
reactions and repetitively act on a single glycan molecule. This implies that glycan
samples of natural origin usually do not consist of a single chemical species
but rather are non-uniform.¹ Despite sharing several chemical features, such as the
building blocks (i.e. the monosaccharide moieties and/or their sequence) and the
types of inter-sugar linkages, glycans in a non-uniform sample have different molar
masses or degrees of polymerisation (DP). As an example, the various soluble
starch synthases exert distinct yet partially overlapping functions when synthesising
the various chains of the amylopectin molecule [5]. Finally, carbohydrate-active
enzymes often interact with small regions of the entire polysaccharide molecule. If
(as is frequently the case) a given enzyme undergoes multiple interactions with
the glycan, properties of the glycan-protein complex are largely determined by the
avidity of this complex rather than by the affinity describing the interaction between
a single carbohydrate-binding site and a single site of the target carbohydrate. The
binding site can be closely related to the catalytically active site but, in many cases,
is physically separated from the latter.

¹ We adopt a recent IUPAC recommendation and refer to samples as being uniform instead of
monodisperse, a self-contradictory term, and non-uniform instead of polydisperse, a tautological
term [43].

For several reasons, this mode of action of many carbohydrate-active enzymes 
complicates the description and characterisation of these reactions. First, any empirical
determination of the usual kinetic (Km and Vmax) and thermodynamic (Keq)
parameters of the series of reactions is difficult because, in most cases, an individual
reaction cannot be separated from the others. From a theoretical point of view, an
appropriate description of these reactions requires a large number of parameters and
rate equations that, to a large extent, cannot be empirically determined. Furthermore,
the thermodynamic equilibrium of a series of related reactions is difficult
to define. Finally, the enzymatic actions at the surface of insoluble substrates (such 
as native starch granules) can certainly not be interpreted in terms of the classical 
Michaelis-Menten equation or a more advanced rate law that assumes enzymes act- 
ing in homogeneous systems. These reactions take place in an inhomogeneous sys- 
tem and, therefore, essential parameters such as volume-based substrate or enzyme 
concentrations are not well defined or insufficient. Instead, structural features of the 
insoluble carbohydrate substrate(s) are highly relevant for the enzymatic actions. 

In the following, we summarise the current knowledge on carbohydrate-protein 
interactions and present a theoretical approach to appropriately describe reiterat- 
ing actions of carbohydrate-active enzymes on soluble and insoluble carbohydrate 
substrates. We do not consider kinetic features of protein complexes consisting of 
several enzyme activities. 



2 Challenges for modelling polymer systems 

The diversity of polymeric species and of the possible chemical transitions between
them requires considering more degrees of freedom than is usually the case in kinetic
models. We discuss modelling strategies to address this combinatorial complexity. 

Another complication arises if intra- and intermolecular interactions of polymers
lead to macroscopic boundaries like the starch granule interface. The heterogeneity
that is introduced by these interactions has a profound influence on the enzymatic 
accessibility of parts of the substrate and thus on reaction rates. 



2.1 Soluble polysaccharides 

Kinetic models describe the state of a biochemical reaction system by introducing 
suitable state variables. These are usually copy numbers (X = X_1, ..., X_N) or concentrations
(x = x_1, ..., x_N) of a fixed number N of individual chemical species. The
state variables span the state space whose dimension is N. There are two potential 
problems with this approach when applied to glycan reaction systems. First, it is 
difficult to define all relevant species that make up the state space, since monomers 
can be combined in many different ways to form a vast number of diverse species. 
Second, in models of open systems at least, the choice of an upper limit on gly- 
can size is somewhat arbitrary since, for example, cogent information on a sharp 
limiting DP is missing. This can introduce artificial boundary effects in computer 
simulations that are not observed in reality. To circumvent these effects it is possi- 
ble to choose a very high maximum DP, such that the numerical error is very small 
compared to the measurement error under given experimental conditions. If that is
not sufficient, computer memory can be allocated dynamically during the simulation
to extend the state space 'on the fly'.

To get an idea of the combinatorial complexity of glycans we briefly discuss the 
possible number of structures for a (non-cyclic) polysaccharide of a given DP. 

The chemical structure of polymers depends essentially on the number and type
of functional groups of the constituent monomers [16]. In the case of monosaccharides,
functionality is determined by the OH-groups through which carbon atoms
of any two given monomers can condense. For example, glucose, having functional
groups at C1, C4 and C6, is considered a trifunctional monomer (although in dextrans
condensation via C3 and C2 has been reported as well). While bifunctional
monomers only allow for linear sequences of monomers, polyfunctional monomers 
can account for nonlinear, branched structures like amylopectin. 

The notion of sequence is usually well-defined for linear polymers since both 
ends are chemically distinguishable by the functional group that is exposed (3'- and
5'-end in nucleic acids or reducing and non-reducing end in polysaccharides).
However, the sequence of monomers is relevant only in cases where different
monomers are combined in which case we speak of co- or heteropolymers, like 
DNA or heteroglycans. Unbranched homopolymers, derived from a single type of 
monomer, are sufficiently determined by their DP. 

Branched polymers are more complex than linear polymers in that they have at 
least two types of bonds and cannot be described anymore by a single sequence, 
much less by a single DP. For homoglycans, an experimentally accessible observ- 
able that conveys a better description than DP of the whole polymer is the DP distri- 
bution of individual branches which can be obtained by enzymatic hydrolysis. Still, 
this distribution only partly reflects differences in the distribution of branch points 
within the polymer. 

To illustrate the combinatorial complexity of polymers we focus on α-glucans,
where trifunctional glucose monomers can be condensed by two types of O-glycosidic
bonds, mainly α-1,4 and α-1,6. Mono- and divalent glucosyls form
the majority and are typically only α-1,4-linked, while those that are condensed
at all three sites become branch points. Two classical examples are amylopectin and
glycogen. The non-random, clustered occurrence of branch points in amylopectin
leads to a different conformation and physical properties compared to glycogen,
where the branch points are more randomly distributed [54]. These structural differences
are correlated with their different physiological functions.

To perform the counting of possible glucan structures it is useful to model the
class of glucans as a mathematical object called a graph, basically a set of nodes
and edges endowed with a certain relationship (connectedness). This strategy, although
ignoring three-dimensional features like conformation, has a long tradition
in combinatorial chemistry, for example to model the carbon skeleton of alkanes
by representing C atoms as nodes and C-C bonds as edges. There are many different
types of graphs, but a suitable model for (acyclic) glucans is a so-called rooted,
plane unary-binary tree.² Figure 1 shows all possible trees up to size DP = 5, that is,
trees with up to five nodes. The sequence in the table indicates that the number of possible
glucan structures grows enormously with increasing DP. This sequence, the
so-called Motzkin numbers, is well-known in combinatorics as the solution to many
different combinatorial problems (A001006 in the On-line Encyclopedia of Integer
Sequences) and an exact formula can be given [11, 15].



[Figure 1: drawings of all rooted, plane unary-binary trees up to DP = 5, accompanied by the table below]

Motzkin Sequence

DP    Structure Count
 3    2
 4    4
 5    9
 6    21
 7    51
 8    127
 9    323
10    835
15    113,634
20    18,199,284
25    3,192,727,797
30    593,742,784,829

Fig. 1 The combinatorial explosion of glucans. The figure shows all possible rooted and plane
unary-binary trees up to DP = 5. The table exemplifies the growth of the number of structures with
DP as given by the Motzkin sequence (see text).
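The structure counts in the table can be reproduced with a short recursion, sketched here in Python. The sketch assumes only the tree model described in the text: the root of a tree with DP nodes carries either a single child subtree (an unbranched continuation) or two ordered child subtrees (a branch point). The function name `count_glucan_trees` is ours and purely illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_glucan_trees(dp: int) -> int:
    """Number of rooted, plane unary-binary trees with `dp` nodes."""
    if dp < 1:
        raise ValueError("DP must be a positive integer")
    if dp == 1:
        return 1  # a single monomer
    # Root with one child: the remaining dp - 1 nodes form one subtree.
    unary = count_glucan_trees(dp - 1)
    # Root with two ordered children: split the remaining dp - 1 nodes
    # into a first subtree of size i and a second of size dp - 1 - i.
    binary = sum(
        count_glucan_trees(i) * count_glucan_trees(dp - 1 - i)
        for i in range(1, dp - 1)
    )
    return unary + binary


if __name__ == "__main__":
    for dp in (3, 4, 5, 10, 20, 30):
        print(dp, count_glucan_trees(dp))
```

The recursion mirrors the footnoted definition of the tree directly, which is why it was preferred here over a closed formula; memoisation keeps the cost quadratic in DP.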



Clearly, only a tiny subset of this overwhelming number of species is relevant 
in natural systems, and this is mainly due to enzymes that constrain possible tran- 
sitions. A structure that cannot result from enzymatic activity at a reasonable time 
scale can be safely ignored. Thus, restriction to enzymatically possible transitions 
allows us to reduce the state space. Nonetheless, enzyme activity can produce a
highly non-uniform system even when the starting conditions are uniform. To illustrate
the diversity of enzymatically catalysed reactions, we consider state transitions
that a single branched glucan can undergo in terms of changes in the number of
branch points k and monomers n. Table 1 summarises possible transitions and gives,
for each type of transition, a carbohydrate-active enzyme (CAZyme) example from
starch metabolism that can catalyse this transition under physiological conditions.
Note that the transitions referred to as grafting and cutting can each be accomplished
by different means. We can speak of grafting (Δk > 0, Δn > 0) if the
glucan at hand is

• condensed through an α-1,6-bond with a branched or unbranched glucan, or

• condensed through an α-1,4-bond with a branched glucan.

Likewise, we can speak of cutting (Δk < 0, Δn < 0) if the glucan at hand is

• hydrolysed at an α-1,6-bond (typically referred to as debranching), or

• hydrolysed at an internal α-1,4-bond, such that a branched glucan is removed.

² This type of tree has a single designated root node, from which the whole tree emerges (in our
case the reducing end of the glucan), at least one monovalent leaf node (the non-reducing ends)
and di- or trivalent intermediate nodes. If a node has a single descendant or child we assume that
it is always only one type of bond (here α-1,4-glycosidic). Only in the case of two children do we
distinguish between them, hence we have a planar or ordered tree. If we wanted to distinguish
two types of bonds in general, we would have to model the glucan as a labelled tree, which would
result in a different combinatorial problem.



Transition        Description                                        CAZyme Example

Δk = 0, Δn = 0    redistribution of branches or branch lengths       4-α-glucanotransferase
Δk = 0, Δn > 0    elongation of a branch                             starch synthases, α-glucan phosphorylase
Δk = 0, Δn < 0    shortening of a branch                             β-amylase, α-glucan phosphorylase
Δk > 0, Δn = 0    internal branching                                 branching enzymes
Δk < 0, Δn = 0    internal debranching                               isoamylases, pullulanases
Δk > 0, Δn > 0    grafting with another (possibly branched) glucan   branching enzymes
Δk < 0, Δn < 0    cutting off a (possibly branched) glucan           debranching enzymes, α-amylase

Table 1 Reactions on polymers in terms of changes in number of branch points k and monomers
n.

There are in principle two approaches to model large-scale polymer systems, the
individual-based approach and the continuous mixture approximation.

The mechanistic, individual-based approach distinguishes each chemical species
and formulates reaction rates for every reaction in the system. This leads to a set
of differential equations that describes how the polymer composition changes over
time. In deterministic models, concentrations describe the composition and the time
evolution is determined by ordinary differential equations. The individual-based
approach to polymer dynamics goes back to Smoluchowski in the deterministic
case [42, 35]. In probabilistic (or stochastic) models, each state, defined by the copy
number of each component, is described by its probability. The differential equation
describing the time evolution of these state probabilities is known as the chemical
master equation (CME) [55, 2]. In practice, the CME is simulated by the stochastic
simulation algorithm (SSA) [17], which "draws" individual trajectories for each
species.
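A minimal sketch of how an SSA trajectory could be drawn for a population of linear chains is given below. It is not taken from any of the cited models. It assumes two hypothetical reactions, elongation (DP to DP + 1) and shortening (DP to DP - 1, for DP > 1), whose rate constants `k_elong` and `k_short` do not depend on DP, and it does not model the supply or release of monomers. The copy numbers are stored in a dictionary keyed by DP, so the state space grows only when a new chain length actually appears.

```python
import random
from collections import Counter


def pick_chain(counts, rng, min_dp):
    """Pick the DP of one chain at random, weighted by copy number."""
    eligible = [(dp, n) for dp, n in counts.items() if dp >= min_dp and n > 0]
    dps, weights = zip(*eligible)
    return rng.choices(dps, weights=weights, k=1)[0]


def ssa_chain_lengths(initial, k_elong, k_short, t_end, seed=0):
    """Draw one trajectory and return the copy numbers {DP: count} at t_end."""
    rng = random.Random(seed)
    counts = Counter(initial)
    t = 0.0
    while True:
        n_chains = sum(counts.values())
        # Propensities: every chain can grow, only chains with DP > 1 can shrink.
        a_elong = k_elong * n_chains
        a_short = k_short * (n_chains - counts[1])
        a_total = a_elong + a_short
        if a_total <= 0.0:
            break  # nothing can react
        t += rng.expovariate(a_total)  # waiting time to the next event
        if t > t_end:
            break
        # Choose the reaction type in proportion to its propensity.
        if rng.random() * a_total < a_elong:
            dp, step = pick_chain(counts, rng, min_dp=1), +1
        else:
            dp, step = pick_chain(counts, rng, min_dp=2), -1
        counts[dp] -= 1
        if counts[dp] == 0:
            del counts[dp]
        counts[dp + step] += 1  # a new DP key is created on demand
    return dict(counts)


if __name__ == "__main__":
    final = ssa_chain_lengths({5: 100}, k_elong=1.0, k_short=0.5, t_end=2.0, seed=1)
    print(sorted(final.items()))
```

Starting from a uniform sample (100 chains of DP 5 in the usage example), a single run already yields a non-uniform distribution, while the total number of chains is conserved by construction.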
It is clear that the individual-based approach is only suitable if we know the stoichiometric
matrix of the system and the parameters involved in each reaction. This
complexity can be reduced by applying lumping techniques. In the case of polymers 
the principle of reaction shortsightedness [1] can be applied. This means that the
reactivity of a polymer depends only on the local configuration of the reactive group
but not on the size and shape of the molecule more than some number of monomers
away. This allows kinetic parameters to be lumped. The same principle can be applied to
binding where, as a rule, the probability of forming positional isomers depends on a
limited number of participating monomer units [51], that is, only a small number of
binding modes need to be distinguished.

The continuous mixture approximation ignores the discrete nature of the compo- 
nents altogether and considers the time evolution of a concentration function c(x) 
that varies continuously with some descriptor x of the system, like temperature or 
weight. Given the enormous number of species present in a mixture, it is assumed
that two adjacent species differ so little that their difference can be considered infinitesimal,
dx. Thus, the concentration of a polysaccharide G_i with, say, DP x_i is replaced
by c(x_i)dx, the concentration of material with DP in the interval (x_i, x_i + dx).
The study of the dynamics of polymer distribution functions apparently goes back
to De Donder and was further developed in particular by Aris [3, 21].
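To make the idea of a concentration function concrete, the following sketch treats a deliberately simple case that is our own choice and not taken from the text: material with descriptor x is consumed by a first-order reaction with rate constant k(x) = x, and the initial distribution is c(x, 0) = exp(-x). Under these assumptions each slice decays as c(x, t) = exp(-x) exp(-x t), and the total concentration, the integral of c(x, t) over x, equals 1/(1 + t). The code approximates this integral with the trapezoidal rule on a truncated interval.

```python
import math


def total_concentration(t, x_max=60.0, n_steps=60_000):
    """Integrate c(x, t) = exp(-x) * exp(-x * t) over x in [0, x_max]."""
    dx = x_max / n_steps

    def c(x):
        # Assumed initial distribution exp(-x), assumed rate constant k(x) = x.
        return math.exp(-x) * math.exp(-x * t)

    total = 0.5 * (c(0.0) + c(x_max))
    for i in range(1, n_steps):
        total += c(i * dx)
    return total * dx


if __name__ == "__main__":
    for t in (0.0, 1.0, 3.0):
        print(t, total_concentration(t), 1.0 / (1.0 + t))
```

Although every slice of the mixture decays exponentially, the lumped total decays hyperbolically, which illustrates why the behaviour of a non-uniform mixture cannot simply be read off from the kinetics of its individual components.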



2.2 Insoluble polysaccharides 

In the previous section, we ignored non-covalent interactions within and between 
polymers. Both lead to additional complexity in describing the state of a polymer 
or polymer mixture. To illustrate this briefly, we focus on α-glucans but point the
reader to a detailed review [9] of structural aspects of starch for further details.

In linear α-glucans one can observe conformational transitions between disordered
coil and ordered helix states, depending on DP, temperature and solvent properties [6].
These pure conformations are only extremes and within long polymers
helical regions may be interrupted by less ordered (melted) regions. The single helix
conformation is observed with amylose. Two helical α-glucans can form a double
helix and several double helices can interact to form a crystalline phase. In starch
the double helix is typically formed by adjacent amylopectin side-chains and the
alignment of several double helices forms the crystalline lamellae of the starch granule.
These double helices can align into different configurations, which are known
as allomorphs. The crystalline lamellae can melt as well, a process which is linked
to starch gelatinisation [56].

From the point of view of enzyme activity, the most interesting aspect of these 
intermolecular interactions is the formation of interfaces. The existence and explicit 
incorporation of interfaces complicates modelling since the enzymes become part of 
a heterogeneous reaction system where the reactants are part of a different phase and 
not entirely accessible. Any controlled mass transfer between the aggregated and the
aqueous phase (e.g. starch synthesis and breakdown) requires enzyme diffusion to,
and adsorption and activity at, the interface. In some cases the enzyme can move on
the substrate interface to act repetitively.

In heterogeneous systems, the state variables depend on spatial coordinates if 
diffusion or convection are significant. The resulting reaction-diffusion equations 
are partial differential equations (PDEs) that are more difficult to handle analyti- 
cally and computationally than ODEs. The PDEs can be replaced by ODEs if the 
fast diffusion approximation is applied. This means that adsorption or binding at 
the interface is treated like the transport between two compartments. If adsorption 
is reversible and is assumed much faster than substrate turnover a further simplifi- 
cation is possible by using adsorption isotherms. The most well-known adsorption 
isotherm that has also been used for modelling surface-active enzymes is the Lang- 
muir isotherm. 
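As a rough illustration of how an adsorption isotherm turns into a rate law under the fast-adsorption assumption, the sketch below combines the Langmuir isotherm, coverage = K e / (1 + K e) for free enzyme concentration e, with a turnover rate proportional to the amount of adsorbed enzyme. The parameter names (`k_ads`, `gamma_max`, `area`, `k_cat`) and the numerical values are hypothetical and are not taken from any of the cited models.

```python
def langmuir_coverage(enzyme_conc, k_ads):
    """Fraction of surface binding sites occupied at adsorption equilibrium."""
    return k_ads * enzyme_conc / (1.0 + k_ads * enzyme_conc)


def surface_rate(enzyme_conc, k_ads, gamma_max, area, k_cat):
    """Turnover rate of a surface-active enzyme under fast adsorption.

    gamma_max: maximal amount of enzyme adsorbed per unit surface area
    area:      accessible substrate surface area
    k_cat:     turnover number of the adsorbed enzyme
    """
    adsorbed = gamma_max * area * langmuir_coverage(enzyme_conc, k_ads)
    return k_cat * adsorbed


if __name__ == "__main__":
    for e in (0.1, 1.0, 10.0, 100.0):
        print(e, surface_rate(e, k_ads=0.5, gamma_max=2.0, area=3.0, k_cat=10.0))
```

In this form the rate saturates with enzyme concentration rather than with substrate concentration, and it scales with the accessible surface area instead of a volume-based substrate concentration.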



3 Overview of existing models 

The challenges discussed above that arise from the complexity of carbohydrate poly- 
mers such as starch have been addressed by various authors on different levels of 
complexity. 

Rollings's review [39], apart from being a good general introduction to polymer
degradation, gives an overview of deterministic modelling up to the mid 1980s. He
discusses the differences between single-chain, multichain and multiple chain attack
models, three action patterns that have been considered for α-amylase in pioneering
studies by Robyt and French [37, 38]. He further explains how endo-acting and exo-acting
enzymes lead to different product distributions and how their joint activity
could lead to synergistic effects.

We will not review the models in [39] but want to point out some common features
and problems. All models assume that random fission of glycosidic links basically
follows Michaelis-Menten-like kinetics. Substrate multiplicity leads to inhibition
terms in the denominator of the rate laws and enzyme inactivation is occasionally
considered. Nearly all of the models suffer from the aforementioned combinatorial
explosion, either in the form of infinite sums over substrate concentrations or
many different parameters. Complicated models may, under some circumstances,
give a good fit to experimental data but the general insight gained from this exercise
is usually very limited. As a rule, the more complicated a model, the more difficult
its interpretation, and a parameter set that gives an exceptionally good fit for a
specific experiment can fail miserably under different conditions.
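The inhibition terms in the denominator arise whenever several substrates compete for the same enzyme. A minimal sketch, assuming simple competitive Michaelis-Menten kinetics for each chain length and using hypothetical parameter values, shows how the number of parameters and summands grows with the number of species:

```python
def competing_substrate_rates(conc, v_max, k_m):
    """Rates for substrates that compete for one enzyme.

    conc, v_max, k_m: dictionaries keyed by species (here: DP).
    v_i = v_max_i * (s_i / k_m_i) / (1 + sum_j s_j / k_m_j)
    """
    saturation = {dp: conc[dp] / k_m[dp] for dp in conc}
    denominator = 1.0 + sum(saturation.values())
    return {dp: v_max[dp] * saturation[dp] / denominator for dp in conc}


if __name__ == "__main__":
    conc = {3: 1.0, 4: 2.0, 5: 0.5}
    v_max = {3: 1.0, 4: 1.0, 5: 1.0}
    k_m = {3: 2.0, 4: 1.0, 5: 0.5}
    print(competing_substrate_rates(conc, v_max, k_m))
```

Every additional chain length adds one concentration term to the shared denominator and, unless parameters are lumped, two further kinetic constants.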

For unknown reasons, Rollings omits some valuable theoretical work on depolymerizing
enzymes, especially that of Hanson [19], who derived an analytic rate expression
for short-chain cleavage, resulting in general rate laws including endo- and
exohydrolases as special cases. Another early pioneer was John Thoma, who extended
Hanson's work by considering endwise cleavage of mixed polymer populations
and heteropolymers [52], devised mathematical models to test different attack
hypotheses for α-amylase [50, 49], and whose demonstration, together with Daniel
Koshland, that internal polymer segments inhibit β-amylase was important evidence
for the induced-fit theory of enzyme catalysis [53].

With the similar goal of deriving closed rate expressions, Chetkarov and Kolev [7]
have applied the classical approach of Michaelis and Menten [30] to enzyme-catalysed
hydrolysis of homopolymers. An interesting result is their conjecture that
the Michaelis constant Km decreases with increasing number of bonds of the substrate
molecule.

The comprehensive treatment by Thoma [52] of rate laws for enzymes degrading
mixed polymers by endwise cleavage demonstrates rigorously that many different
phenomena may be hidden in a rate law of the classical Michaelis-Menten
form, and that the apparent parameters of maximal rate (Vmax) and Michaelis constant
(Km) depend in general on a large variety of phenomena occurring at the
molecular level. While in some cases the direction of the effect can be predicted
(e.g. competitive self-inhibition will always lower the apparent Km while the degree
of repeated attack will always increase it), in many cases the effect on apparent
values cannot be predicted a priori. Both the apparent maximal rate and the apparent
Michaelis constant depend in general on heterogeneity in monomer composition, the
variation in the type of polymer distribution and on enzymatic properties such as
competitive self-inhibition, the degree of repetitive attack and the occurrence of
multiple intermediates.

Derivation of explicit rate laws is only possible for relatively simple systems. It is 
therefore not surprising that these approaches remained limited to bond hydrolysis, 
where the elementary reaction works on a single substrate molecule. A noteworthy 
approach is therefore the work by Mulders and Beeftink [31], who derived analytic
expressions for enzymatic polymerisation and could show how the resulting chain 
length distribution depends on the Michaelis constant for the elementary elonga- 
tion reaction. As a general tendency it could be shown that a Michaelis constant 
lower than the substrate concentration (indicating high substrate saturation) leads to 
narrower distributions of the degree of polymerisation. 

A more complex treatment is required, for example, for transglycosidation reactions,
important reactions in the turnover of starch. Here a reversible bi-bi mechanism
must be assumed, and both substrate molecules are of an unspecified length.
To treat such enzymes, Hiroshi Nakatani employed Monte Carlo simulations and
described in a series of papers various models of the action of enzymes on soluble
carbohydrate polymers. In [32, 22] a model of the action of β-amylase is presented,
which takes into account the possibility of repeated attack of the enzyme
without dissociation of the substrate. In [33] a conceptually similar model is discussed
which describes the action of α-glucanotransferases, which include central
enzymes in the starch degradation pathway such as DPE1. Finally, a model of the
enzyme hyaluronidase is discussed in [34]. While the carbohydrate hyaluronan is
not present in plants, the action of hyaluronidase shares common principles with
many starch degrading enzymes. A particularly interesting aspect of hyaluronidase
is that its possible catalytic activity includes transglycosidation, as well as condensation
and hydrolysis. In this respect, the presented model can also serve as a prototype
for isolated multi-enzyme systems. All these models are limited to soluble carbohydrates
and describe in vitro systems containing a single enzyme, and the simulation
technique is a simple Monte Carlo simulation.

Stochastic models have also been applied to simulate the complex soluble structure
of amylopectin. In [28, 27], Marchal and coworkers model amylopectin as a
matrix and demonstrate how this is used in Monte Carlo simulations to compute
sugar release. This model is highly illustrative and useful to investigate how molecular
mechanisms determine overall characteristics of long polymers such as chain
length distribution and branching patterns. In [27] the authors applied their model
to evaluate various suggested subsite patterns of α-amylase and found that the inclusion
of specific inhibition terms improved the predictive power of the model.

Besides its non-uniform composition, the insoluble nature of starch poses a ma- 
jor challenge for any theoretical description of starch degrading or synthesising pro- 
cesses. 

McLaren and Packer [29] summarise important earlier attempts to model enzymatic
reactions in various heterogeneous systems including the action of soluble enzymes
on carbohydrates like cellulose, chitin and starch. Rollings's [39] and Zhang
and Lynd's [60] reviews focus on cellulose degradation but their observations regarding
the influence of adsorption and surface properties (specific surface area and
surface states) are valid also for starch. In particular, both insoluble cellulose and
starch have ordered and less ordered interfaces that are differently susceptible to
enzymatic attack and intercompartmental mass transfer. A more recent review
of cellulase models is given by Bansal et al. [4].

To our knowledge the only review dedicated exclusively to kinetic models of starch
degradation is by Dona et al. [10]. It gives an impression of classical deterministic
approaches geared towards biotechnological applications rather than fundamental
considerations. Here we would like to emphasise models that focus on principal
features or give an interesting perspective on the general problem of heterogeneous
catalysis in its entirety.

An early attempt to simulate degradation of insoluble substrate was made by
Suga and coworkers [45], who describe the degradation of an insoluble cross-linked
dextran by a dextranase from Penicillium funiculosum. Their rather elaborate
model takes into account the transport of enzymes and soluble products through
pores in the insoluble substrate. Tatsumi and Katano [47, 48] developed a rate expression
for the enzymatic surface hydrolysis of raw starch by glucoamylase. Their
results illustrated the importance of including the specific surface area into any rate 
equations of surface-active enzymes and they have systematically validated their 
rate expressions with raw starch granules from different botanical sources having 
different size distributions. 

Building on such results, Kartal and Ebenhöh [23] have systematically derived
a generic rate law for surface-active enzymes, which can be applied to enzymatic
processes at the surface-bulk interface and can easily be generalised for specific enzymatic
mechanisms. The authors demonstrated how different adsorption isotherms
can be used to derive the enzyme kinetics and, due to the generality of their approach,
could explain how different assumptions and different adsorption models
influence the kinetic parameters, in particular the apparent Michaelis and maximal
rate constants. Also, in agreement with and extending the previous approaches mentioned
above, the generic rate law provides a quantitative relation between important
experimental parameters, such as particle size distribution and specific surface area,
and the apparent kinetic parameters.

An insightful mechanistic model has been presented by Levine et al. [26]. This
model is purely based on standard ordinary differential equations and is applied 
to the degradation of cellulose, but it contains a number of interesting features of 
general applicability: The model adapts a procedure to map a collection of arbitrarily 
shaped three-dimensional objects to spheres while preserving total area, volume and 
hydrolysis rate. This procedure, in conjunction with random sequential adsorption 
(RSA) simulations, allows the authors to infer an effective footprint of cellulose 
degrading enzymes and to calculate that the surface-active enzyme occupied nearly 
twice as much surface as its physical footprint would suggest. 
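
The idea behind random sequential adsorption can be illustrated with a deliberately simplified sketch. It is not the procedure of Levine et al.: it places rigid segments irreversibly on a one-dimensional line instead of enzymes on a two-dimensional surface, and the function name, parameters and stopping rule are assumptions made for illustration.

```python
import random


def rsa_1d_coverage(line_length=100.0, footprint=1.0, max_failures=5000, seed=0):
    """Fraction of a line covered after random sequential adsorption.

    Segments of length `footprint` are dropped at random positions and kept
    only if they do not overlap an earlier segment. The run stops after
    `max_failures` consecutive rejected attempts (approximate jamming).
    """
    rng = random.Random(seed)
    starts = []
    failures = 0
    while failures < max_failures:
        x = rng.uniform(0.0, line_length - footprint)
        # Two segments of equal length overlap if their starts are closer
        # than one footprint.
        if all(abs(x - s) >= footprint for s in starts):
            starts.append(x)
            failures = 0
        else:
            failures += 1
    return len(starts) * footprint / line_length
```

Because the jammed surface is never fully covered, the area effectively occupied per adsorbed particle, `footprint / coverage`, is larger than the physical footprint. The numerical factor depends on dimension and particle shape, so this one-dimensional sketch should not be expected to reproduce the value reported for cellulases.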

Another interesting aspect is addressed by the model of Fenske and coworkers [13],
which simulates the action of a glycosidase from Cellulomonas fimi as an example
of an enzyme which is both exo- and endo-acting in the degradation of insoluble
polysaccharides. With a Monte Carlo approach simulating random endo- and exo-attacks
on a two-dimensional array, which models a surface, the authors particularly
investigate whether one enzyme alone can achieve synergism, so-called autosynergism.
The simulations suggest that for autosynergism the enzymes should be in
close vicinity to each other on the substrate surface. However, this raises the
immediate question of how autosynergism might be achieved while avoiding crowding
and jamming of enzymes at the surface, suggesting that for autosynergism to occur
further regulatory mechanisms might be required.

To confuse matters even further and to illustrate the vast amount of specific
details which must be considered when theoretically describing carbohydrate polymer
biochemistry, the work of Xu and Ding [57] is worth mentioning, in which they
have shown that non-Fickian diffusion, resulting from small confined spaces and
crowding, leads to fractal (i.e. non-integer) kinetic orders in Michaelis-Menten-like
rate laws. This theory was applied specifically to the catalytic action of
cellobiohydrolase.



4 The entropic approach to polymer biochemistry 

The review of the numerous theoretical approaches to simulate and understand the 
vastly complex process of biosynthesis and degradation of insoluble carbohydrate 
polymers demonstrates that many different peculiarities have to be considered which 
are not important when building mathematical models of classical pathways in the 
aqueous phase. Most important are the correct description of surface-active enzymes 
and a suitable representation of polymeric structures. The latter in particular is a
novel theoretical challenge, because a straightforward enumeration of all possible
structures is neither practical nor insightful, owing to the combinatorial explosion in
theoretically possible structures.
As a consequence, various Monte Carlo based modelling approaches have been
applied to simulate the action of polymer-active enzymes [32, 33, 34, 22, 28, 27, 13]
(see review in the previous section). Such approaches have the advantage that not
all possible configurations have to be known a priori. However, in many of these
simulations, the temporal progress of the catalytic action was simulated only as a
function of the reaction coordinate (see e.g. [32, 33, 34]) and it was difficult to relate
the substrate formation to the actual time passed. In [24] and [40] the Monte Carlo
approach was slightly modified and simulated using a Gillespie algorithm [17],
which allows the time coordinate to be included explicitly. As a result, the temporal
progress of various glucanotransferases and a plastidial phosphorylase could be
reproduced and explained by the enzymatic mechanisms, and an extremely good match to
experimental in vitro data was obtained.
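
A Gillespie-type simulation of a glucanotransferase can be sketched as follows. This is a minimal illustration and not the implementation used in [24] or [40]. It assumes that every admissible combination of donor, acceptor and number of transferred residues (1, 2 or 3, with the donor keeping at least one residue) has the same stochastic rate constant. The function name and all parameters are hypothetical.

```python
import random


def gillespie_transfer(initial_dps, rate_constant=1.0, max_events=20000, seed=0):
    """Stochastic simulation of transfers between unbranched chains.

    `initial_dps` lists the degree of polymerisation (DP) of each molecule.
    Returns the elapsed time, the final list of DPs and the number of events.
    """
    rng = random.Random(seed)
    dps = list(initial_dps)
    n_mol = len(dps)
    t = 0.0
    events = 0
    while events < max_events:
        # Number of admissible transfer sizes q for each potential donor.
        q_counts = [min(3, n - 1) for n in dps]
        # Every (donor, acceptor, q) triple contributes the same propensity.
        propensity = rate_constant * sum(q_counts) * (n_mol - 1)
        if propensity <= 0.0:
            break
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(propensity)
        donor = rng.choices(range(n_mol), weights=q_counts)[0]
        acceptor = rng.randrange(n_mol - 1)
        if acceptor >= donor:
            acceptor += 1  # skip the donor itself
        q = rng.randint(1, q_counts[donor])
        dps[donor] -= q
        dps[acceptor] += q
        events += 1
    return t, dps, events
```

Starting from, for example, `[8] * 200`, the number of molecules and the total number of residues stay constant while the DP distribution broadens, and each event is assigned an explicit time. A histogram of the returned DPs can then be compared with measured or predicted distributions.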

However, while stochastic simulations can provide a realistic temporal repre- 
sentation of the action of polymer-active enzymes, they are always limited to the 
particular case (including the starting conditions) to which they are applied. A more 
general theory explaining the action of non-substrate-specific enzymes acting on
polymers of arbitrary length in a wider context has recently been proposed by
Kartal, Ebenhoh and coworkers [24, 12].

The hallmark of the developed theory is that it accepts the fact that the number 
of specific reactions catalysed by polymer-active enzymes, such as glucanotrans- 
ferases, is in principle infinite and so are the different chemical structures which 
may serve as substrate. It then draws parallels between biochemical systems with 
non-uniform polymer composition and canonical ensembles in statistical thermo- 
dynamics and arrives at the conclusion that polymer-active enzymes are driven by 
a combination of release of enthalpy and an increase in the mixing entropy of the 
polymer solution. In other words, polymer-active enzymes tend to maximise the dis- 
order by creating a maximally mixed state of different chain lengths. In the interest- 
ing special case of the glucanotransferases DPE1 and DPE2, the change of enthalpy 
is zero [18| and consequently the increase in mixing entropy is the only driving 
force of these enzymes. The theoretically predicted equilibrium distributions have 
been verified experimentally in [ 24 1 with extremely high accuracy. 

The underlying conceptual idea, which allows principles from statistical
thermodynamics to be applied to biochemical systems acting on non-uniform polymer
mixtures, is that different chain lengths (or degrees of polymerisation, DP) are
identified with different energy states of a molecule. In the case of a simple
unbranched chain with a single type of bond, the enthalpies of the bonds linking the
monomers correspond to the energy state of the whole molecule. If, for example, the
bond enthalpy is $E$, the energy state of a linear polymer consisting of $n$ monomers
equals $E_n = (n - 1) \cdot E$. Thus, all possible configurations (chains of length
1, 2, ...) are represented by equidistant energy states with the ground state
$E_1 = 0$ corresponding to a monomer.

As a prototype consider the reaction catalysed by disproportionating enzyme 1 
(DPE1), a plastid-located glucanotransferase involved in starch metabolism. This 
enzyme catalyses the transfer of 1, 2 or 3 glucosyl residues from one unbranched 
malto-oligosaccharide to another, a reaction that can be written as 




$$G_n + G_m \rightleftharpoons G_{n-q} + G_{m+q} \quad \text{with } q = 1, 2, 3, \qquad (1)$$

where $G_k$ denotes an unbranched, α-1,4-linked glucan of DP $k$. In this special case,
the catalysed reactions do not change the overall enthalpy, because in every catalytic
step one bond is opened and another one closed and, moreover, the bond enthalpy is
independent of DP and position within the polymer [18]. The action of DPE1 in
the thermodynamic picture is schematically depicted in Fig. 2 for a case in which
the reaction is initialised with a uniform solution of glucans of DP 8. According to
Eq. (1), the catalytic action of DPE1 on a statistical ensemble of energy states
corresponds to the simultaneous downward shift of one molecule from an energy state
$E_n$ to $E_{n-q}$ (donor reaction) and an upward shift of another molecule from energy
state $E_m$ to $E_{m+q}$ (acceptor reaction), where $q = 1, 2, 3$.

[Figure 2: schematic; horizontal axis: reaction step]

Fig. 2 Scheme of the DPE1 mediated reaction system in the statistical thermodynamics picture.
The energy states $E_k$ correspond to α-1,4-linked glucans of DP $k$. DPE1 mediates transfers of
glucose, maltose and maltotriose units, i.e. $q = 1, 2, 3$. In each reaction step the system follows an
arbitrary dashed and solid arrow of the same colour simultaneously. This leads to a combinatorial
explosion of the reaction system.

In statistical thermodynamics, the principle of maximum entropy provides the 
method to calculate the equilibrium distribution of the occupation of each energy 
state depending on the total energy within the system. In physical systems (such as 
gases or atomic ensembles), the total energy is given by the temperature of the sys- 
tem. In a biochemical system in which the total enthalpy is conserved (such as for 
DPE1), the total energy is given by the (conserved) total number of bonds between 
monomers in the system. This quantity can be controlled through the experimental
starting conditions. If, for example, an in vitro assay of DPE1 is incubated with
maltopentaose molecules only, there are b = 4 bonds per molecule and this average
number will remain constant in time. Thus, in this particular scenario, the average 
bond number is analogous to the temperature in physical systems and all known 
formulae from statistical thermodynamics can directly be applied to determine the 
equilibrium distribution of the enzymatic system. So, for polymer biochemical sys- 
tems, the principle of entropy maximisation states that the action of an enzyme will 
continue until the distribution of DPs within the solution is maximally mixed, as is 
characterised by a maximal value of the mixing entropy 

$$S_{\mathrm{mix}} = -\sum_k x_k \ln x_k, \qquad (2)$$

where $x_k$ denotes the molar fraction of a polymer of DP $k$. A short calculation to
determine the maximal value of $S_{\mathrm{mix}}$ under the constraints that the total number
of molecules is conserved ($\sum_k x_k = 1$) and the total number of bonds is conserved
($\sum_k (k - 1) \cdot x_k = b$) leads to the prediction that in equilibrium the molar fractions of
the different DPs are distributed as

$$x_k \propto e^{-\beta k}, \qquad (3)$$

where $x_k$ is the molar fraction of molecules with DP $k$, $\beta = \ln((b + 1)/b)$, and $b$ is the
average number of bonds per molecule.
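
The short calculation can be sketched with Lagrange multipliers. This is a reconstruction under the assumption that chain lengths are unbounded ($k = 1, 2, \ldots$); the symbols $\mathcal{L}$, $\lambda$ and $C$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\mathcal{L} &= -\sum_{k\ge 1} x_k \ln x_k
  - \lambda\Big(\sum_{k\ge 1} x_k - 1\Big)
  - \beta\Big(\sum_{k\ge 1} (k-1)\,x_k - b\Big), \\
0 &= \frac{\partial \mathcal{L}}{\partial x_k}
   = -\ln x_k - 1 - \lambda - \beta\,(k-1)
  \;\Longrightarrow\; x_k = C\,e^{-\beta (k-1)} \propto e^{-\beta k}, \\
1 &= \sum_{k\ge 1} x_k = \frac{C}{1 - e^{-\beta}}
  \;\Longrightarrow\; C = 1 - e^{-\beta}, \\
b &= \sum_{k\ge 1} (k-1)\,x_k = \frac{e^{-\beta}}{1 - e^{-\beta}}
  \;\Longrightarrow\; \beta = \ln\frac{b+1}{b}.
\end{align*}
```

The multiplier of the bond constraint thus plays the role of $\beta$ in the exponential distribution, and its value is fixed entirely by the average number of bonds per molecule.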

This formula quantifies precisely how the equilibrium distribution depends on
the initial conditions and, moreover, leads to a conceptual advance by introducing a
novel constant, $\beta$, which is a generalisation of the classical equilibrium constant for
the case of non-uniform polymeric systems.
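
As a numerical illustration, the predicted distribution can be evaluated for a given average bond number. The sketch below assumes that chains longer than a cut-off `k_max` are negligible; the function name and the cut-off are hypothetical.

```python
import math


def equilibrium_fractions(b, k_max=200):
    """Molar fractions x_k proportional to exp(-beta * k), beta = ln((b + 1) / b).

    Returns a dict mapping DP k (1..k_max) to its molar fraction, normalised
    over the truncated range.
    """
    beta = math.log((b + 1.0) / b)
    weights = {k: math.exp(-beta * k) for k in range(1, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

For an assay started with maltopentaose only, one would call `equilibrium_fractions(4)`. The resulting fractions sum to one and reproduce the average of four bonds per molecule, which is a quick consistency check of the constraints used in the derivation.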

While in the case of DPE1, the theory is presented in its simplest form and the 
analogies between polymer biochemistry and statistical thermodynamics become 
most clear, it is of general validity and thus applicable to a wide spectrum of
systems, as was demonstrated in [24, 12, 40]. Besides providing experimental evidence
for the soundness of the theoretical concepts, Kartal et al. [24] proved in the
accompanying supplementary information that the thermodynamic formulae can be
deductively derived from first principles in the case of a mixture of dilute solutions. 
This derivation is highly illustrative because it shows the mathematical formulation 
in its most general form for an arbitrary biochemical reaction system. 

For the generalisation of the theory, various issues have to be taken into account. 
In many cases enthalpies are not conserved. For example, the reaction catalysed by 
phosphorylase, which transfers the non-reducing glucosyl residue of an unbranched
glucan to an orthophosphate according to the formula 

$$G_n + P_i \rightleftharpoons G_{n-1} + \mathrm{G1P}, \qquad (4)$$

opens an α-1,4 glucosidic linkage while closing a phosphoester bond with quite
a different bond enthalpy. In such a case, the biochemical system is analogous to a
closed system in statistical physics, rather than an isolated system as was the case for
DPE1. From the analogies it follows directly that now an energetic and an entropic
term have to be considered. The progress of the biochemical reaction (4) from left
to right will lead to a release of enthalpy and concomitantly to a combination of
reactants with a lower total Gibbs energy of formation. Kartal et al. [24] have shown
that the correct way to predict the equilibrium distribution is by minimising the
Gibbs free energy of the system, which is related to the mixing entropy by

$$G = G_f - T \cdot S_{\mathrm{mix}}, \qquad (5)$$

where $G_f$ is the total summed Gibbs energy of formation of all reactants and $T$ is
the temperature.

Another important aspect is to consider possible additional constraints imposed 
by the enzymatic mechanisms. Disproportionating enzyme 2 (DPE2), a cytosol- 
located glucanotransferase involved in maltose metabolism, catalyses a reaction ac- 
cording to the formula 

$$G_n + G_m \rightleftharpoons G_{n-1} + G_{m+1} \quad \text{with } n \neq 3,\ m \neq 2, \qquad (6)$$

with the critical limitation that maltose cannot act as an acceptor ($m \neq 2$) and
maltotriose can never act as a donor molecule ($n \neq 3$). This limitation results in the
additional constraint that the sum of the molar fractions of glucose and maltose is
constant ($x_1 + x_2 = m$). Performing the entropy maximisation as in the case of DPE1,
but with this additional constraint considered, leads to the correct prediction of the
equilibrium distribution, as was also experimentally demonstrated in [24]. The
difference to the case of DPE1 is illustrated in Fig. 3. Similar to DPE1, the catalytic
action of DPE2 corresponds in the thermodynamic picture to a simultaneous
occurrence of one arbitrary donor reaction (dashed arrows) and one arbitrary acceptor
reaction (solid arrows). However, since maltose cannot act as an acceptor and
maltotriose cannot act as a donor, no arrow can cross the dashed line, resulting in a
separation into two pools, one containing glucose and maltose molecules and the other
all longer glucans. Fig. 3 illustrates the case in which the reaction is initialised with
maltose and maltohexaose molecules only.

[Figure 3: schematic; horizontal axis: reaction step]

Fig. 3 Scheme of the DPE2 mediated reaction system. Each DPE2 reaction step consists of one
donor and one acceptor reaction depicted by a dashed and a solid arrow, respectively. Due to
the restriction that maltose is never an acceptor and maltotriose is never a donor, the maltose
and glucose pool is separated from the other DPs as shown by the horizontal dashed line. The
scheme exhibits all possible reaction pathways starting from the two indicated initial substrates
maltohexaose and maltose, where in each step one arbitrary solid and one arbitrary dashed path is
taken.
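
The constrained entropy maximisation for DPE2 can be sketched numerically. This is an illustration under stated assumptions, not the calculation of [24]: chain lengths are taken as unbounded and truncated at a cut-off `k_max`, and the extra Lagrange multiplier for the conserved glucose plus maltose pool is eliminated analytically. Both pools are then geometric with the same ratio `r = exp(-beta)` but different prefactors, and `r` is found by bisection from the bond constraint. The function name and the argument `pool_fraction` (the conserved sum of the molar fractions of glucose and maltose) are hypothetical.

```python
def dpe2_equilibrium(pool_fraction, b, k_max=400):
    """Maximum-entropy DP distribution with a conserved DP 1 + DP 2 pool.

    pool_fraction: conserved sum of molar fractions of glucose and maltose.
    b: average number of bonds per molecule.
    Returns a dict mapping DP k (1..k_max) to its molar fraction.
    """
    p = pool_fraction
    if not 0.0 < p < 1.0 or b <= 2.0 * (1.0 - p):
        raise ValueError("constraints cannot be satisfied")

    def mean_bonds(r):
        # Maltose contributes one bond; chains of DP >= 3 contribute k - 1.
        return p * r / (1.0 + r) + (1.0 - p) * (2.0 + r / (1.0 - r))

    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):  # mean_bonds increases monotonically with r
        mid = 0.5 * (lo + hi)
        if mean_bonds(mid) < b:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)

    x = {1: p / (1.0 + r), 2: p * r / (1.0 + r)}
    for k in range(3, k_max + 1):
        x[k] = (1.0 - p) * (1.0 - r) * r ** (k - 3)
    return x
```

For an equimolar start with maltose and maltohexaose one would use `dpe2_equilibrium(0.5, 3.0)`, since half of the molecules belong to the small pool and the average bond number is (1 + 5)/2 = 3. The result keeps the pool fraction and the bond number fixed while distributing the remaining molecules geometrically over DP 3 and longer.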

All the systems discussed above conserve the total number of reactants, because 
all elementary reactions are reversible bi-bi reactions, consuming two molecules 
and producing two molecules. In principle, the theory is also applicable to systems 
not conserving the number of molecules, but in this case the difficulty arises that the 
ratio between concentrations and molar fractions is no longer constant. This leads to 
some changes in the formulae, as has been derived and laid out in [24, 12]. However,
the theory has up to now not been applied and verified experimentally for systems 
not conserving the total number of molecules. 

Although the theory in its present form only allows precise predictions
about the equilibrium states, and living systems are always far from equilibrium, the
theoretical concepts nevertheless provide significant insight into the principles by
which polymer-active enzymes work. Firstly, knowledge of the equilibrium is, as for
classical enzymes, a prerequisite for determining in which direction a reaction will
proceed and for evaluating how far from equilibrium an experimentally determined
physiological state actually is. Moreover, Ebenhoh et al. [12] have shown how the
theoretical knowledge can be used to indirectly determine bond enthalpies from measured
equilibrium distributions.



5 Open problems and conclusions 

Biosynthesis and degradation of complex and insoluble polymers, such as starch or 
cellulose, are multi-faceted processes which are challenging to describe in math- 
ematical models and simulations. What makes the description of these processes 
more difficult than modelling classical pathways which occur in bulk solutions are 
mainly two facts: 1) the insoluble nature and large size make it necessary to distin- 
guish between enzymatic processes occurring in solution and those taking place at 
the, probably very complicated, surface of the substrate; 2) the unlimited flexibility 
in combining monomers into long polymers leads to a combinatorial explosion in 
the numbers of theoretically possible molecular structures that appear as reactants in 
the biochemical pathways. Starch synthesis is an excellent example in which inho- 
mogeneous phases and soluble polymer systems both play a central role. Therefore, 
developing a comprehensive model of starch synthesis confronts us with both of 
these challenges simultaneously. 

The various approaches to theoretically describe and simulate processes on
substrate surfaces are very promising and it appears that the difficulties of including
surface-active enzymes in pathway models can soon be overcome, in particular
thanks to early pioneering work [43] and the development of more and more
general rate laws for surface-active enzymes [47, 48, 23]. On the theoretical and
modelling side, the key issues here will be to derive simplified but sufficiently
accurate descriptions of the insoluble reactants and their surfaces, and to make plausible
assumptions about the different adsorption models of the proteins involved in order to
simulate adequate available area functions, which are the key to correctly representing
competition and crowding effects on the substrate surface. However, experimental
efforts are also necessary to support the development of comprehensive pathway models.
As was demonstrated in the various theoretical works developing surface-active rate 
laws, interpretation of in vitro data has to be performed with great care. Apparent 
turnover rates and Michaelis constants depend on a number of factors, which do not 
have to be considered for enzymes acting in solution. The specific rate, for exam- 
ple, decreases with increasing enzyme concentration and, moreover, depends also 
on the presence of other enzymes acting on the same surface. Such a dependency 
on enzyme concentrations does not exist in bulk solution. Further, the specific rate 
increases with increasing specific surface area, and the Michaelis constant increases 
with decreasing specific surface area and with increasing total enzyme concentration
[23]. Consequently, in vitro experiments, in which the controllable parameters
such as specific surface area and enzyme concentrations are systematically varied, 
are necessary to parameterise the generic rate equations. 

Major challenges are posed for the simulation, and even more for the theoreti- 
cal understanding, of biochemical reactions in non-uniform and complex reactants. 
Despite the recent progress made by successfully finding analogies between polymer
biochemical systems and statistical ensembles [24], it is apparent that a further
development of the theoretical concepts is still necessary. Whereas it is now
understood how equilibrium distributions generated by non-substrate-specific
polymer-active enzymes can be explained and predicted, the biologically relevant
far-from-equilibrium steady states still evade a prediction from first principles. It is therefore
evident that a focus of future theoretical activities in polymer biochemistry research
must lie in advancing our fundamental understanding of which factors govern
the dynamics of polymer-active enzymatic processes. With the established parallels
between polymer biochemistry and statistical thermodynamics it is quite possible
that both scientific fields will mutually benefit from each other, because experimen- 
tally observed dynamics in simple in vitro systems can now in principle conversely 
be employed to draw conclusions about non-equilibrium thermodynamic physical 
systems. Non-equilibrium thermodynamics is still a very active area of research and 
the analogies should be further elaborated to ensure that novel insights gained in 
physics can lead to a deeper understanding of polymer biochemistry. 

Notwithstanding our lack of understanding of far-from-equilibrium states, Monte
Carlo simulations [32, 33, 34] and Gillespie algorithms [24, 40] allow for a precise
prediction and explanation of the temporal evolution of polymer mixtures in in vitro
experiments towards equilibrium. Our new theoretical understanding of
polymer-active enzymes now allows us to define adequate kinetic parameters. In order to
progress from mimicking relatively simple in vitro systems to the biologically
more interesting in vivo situation it will now be necessary to integrate the
theoretically supported stochastic simulations into whole pathway models. Here, several
difficulties can be expected. If, as is commonly the case, reactions in solution are 
modelled by ordinary differential equations, then the integration of these two
fundamentally different types of simulation techniques is far from trivial [1]. If, on the
other hand, also all reactions in bulk solution were to be modelled by stochastic 
simulations, a considerable increase in computation time can be expected and it is 
questionable whether such a pathway model could still be simulated on a standard 
desktop PC in reasonable time. 

In conclusion, the recent advancements in theory building and model development
regarding carbohydrate polymer metabolism have addressed all major problems
and difficulties, and solutions for most aspects have been proposed. We are
therefore currently in the exciting situation that the single pieces and building blocks 
are at hand - at least in a prototype form - but that comprehensive mathematical 
models combining the various approaches, in order to simulate for example the syn- 
thesis of a starch granule, do not yet exist. One focus of theoretical research in carbo- 
hydrate metabolism should therefore lie in the development of integrative pathway 
models, in which processes at surfaces and in solution are combined and polymeric 
diversity is adequately represented. These models can then serve as valuable tools
to first reproduce in simulations complex processes such as starch granule synthesis and
maturation, and later to query in in silico experiments the effect of genetic and
environmental perturbations in order to arrive at a comprehensive understanding of how
physiological regulation is accomplished with highly heterogeneous and disperse
components.